Performance evaluation of MapReduce using full virtualisation on a departmental cloud
نویسندگان
چکیده
This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing on a virtualisation environment comprised of 1+16 nodes running the VMWare workstation software. A set of experiments using the standard Hadoop benchmarks has been designed in order to determine whether or not significant reductions in the execution time of computations are experienced using Hadoop on this virtualisation platform on a departmental cloud. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the possibility of achieving the maximum (peak) performance.
منابع مشابه
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
متن کاملFlame-MR: An event-driven architecture for MapReduce applications
Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the need for improving the Hadoop performance also grows. Existing modifications of Hadoop (e.g., Mellanox Unstructured Data Accelerator) attempt to improve performance by changing some of ...
متن کاملP2P-MapReduce: Parallel data processing in dynamic Cloud environments
MapReduce is a programming model for parallel data processing widely used in Cloud computing environments. Current MapReduce implementations are based on centralized master-slave architectures that do not cope well with dynamic Cloud infrastructures, like a Cloud of clouds, in which nodes may join and leave the network at high rates. We have designed an adaptive MapReduce framework, called P2P-...
متن کاملExperimental Evaluation of Multi-Round Matrix Multiplication on MapReduce
This paper proposes an Hadoop library, named M3, for performing dense and sparse matrix multiplication in MapReduce. The library features multi-round MapReduce algorithms that allow to tradeoff round number with the amount of data shuffled in each round and the amount of memory required by reduce functions. We claim that multi-round MapReduce algorithms are preferable in cloud settings to tradi...
متن کاملPerformance Evaluation of MapReduce Applications on Cloud Computing Environment, FutureGrid
This paper describes the result of performance evaluation of two kinds of MapReduce applications running in the FutureGrid: a data intensive application and a computation intensive application. For this work, we construct a virtualized cluster system made of a set of VM instances. We observe that the overall performance of a data intensive application is strongly affected by the configuration o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Applied Mathematics and Computer Science
دوره 21 شماره
صفحات -
تاریخ انتشار 2011